Abstract. We numerically investigate a mean-field Bayesian approach, assisted by the Markov chain Monte Carlo method, to estimate motion velocity fields and probabilistic models simultaneously in consecutive digital images described by spatio-temporal Markov random fields. Before constructing our procedure, we find that the mean-field variables in the iteration diverge because of an improper normalization factor of the regularization terms appearing in the posterior. To avoid this difficulty, we rescale the regularization term by introducing a scaling factor and optimizing it by minimizing the mean-square error. We confirm that the optimal scaling factor stabilizes the mean-field iterative process of the motion velocity estimation. We next estimate the optimal values of the hyper-parameters, including the regularization term, which define our probabilistic model macroscopically. To do this, we use a Boltzmann-machine type learning algorithm based on gradient ascent of the marginal likelihood (type-II likelihood) with respect to the hyper-parameters. In our framework, one can estimate both the probabilistic model (hyper-parameters) and the motion velocity fields simultaneously. We find that our motion estimation is much better than the result obtained by Zhang and Hanouer (1995), in which the hyper-parameters are set to ad-hoc values without any theoretical justification.

I. Introduction 

Motion estimation in consecutive video frames is an important technique in the image processing and computer vision communities. It is defined as estimating the motion velocity fields (vectors) of objects appearing in two successive (video) frames. In computer vision, the so-called Markov random fields (MRFs for short) have been used to solve various image processing problems. Examples include image restoration [1], texture analysis and segmentation [2], [3], [4], and super-resolution [6]. MRFs let us regularize the ill-posed problems arising in many such applications, so that the original problem can be treated as a combinatorial optimization problem under 'soft' or 'hard' constraints. Indeed, Zhang and Hanouer (1995) [7] and Wei and Li (1999) [8] combined the MRF approach with Bayesian statistics to estimate the motion vectors for a given pair of consecutive digital images. They also used the so-called mean-field approximation to carry out the extensive sums in the marginal probability of the posterior. They showed that the steady states of the mean-field equations are good candidates for the appropriate motion velocity fields. Caplier, Luthon and Dumontier (1998) [9] and Luthon, Caplier and Lievin (1999) [10] implemented the same kind of MRF approach on a DSP-based image processing board of an SIMD (Single Instruction Multiple Data) machine. They demonstrated that the motion velocity can actually be estimated within a realistic time.

In the study by Zhang and Hanouer (1995), the so-called hyper-parameters, which specify the probabilistic model macroscopically, were set to ad-hoc values without any reasonable explanation. There is no theoretical (statistical) justification for such choices when estimating appropriate motion velocity fields. The appropriate hyper-parameters depend, of course, on the given set of consecutive video frames. It is therefore important to determine them systematically under some statistical criterion that gives good (if possible, optimal) average-case performance of the motion estimation.

Taking into account the above requirements from both the theoretical and practical sides, we adopt the viewpoint of Bayesian statistics. We examine a mean-field approach assisted by the Markov chain Monte Carlo method (MCMC for short) to estimate both the motion velocity fields and the hyper-parameters simultaneously in successive video frames described by spatio-temporal MRFs. We find that the mean-field variables in the non-linear maps diverge because of an improper normalization factor of the regularization terms appearing in the cost function. To overcome this difficulty, we rescale the regularization terms by introducing a scaling factor and optimizing it by minimizing the mean-square error. We show that the optimal scaling factor stabilizes the mean-field iterative procedure of the motion velocity estimation. We next estimate the optimal values of the hyper-parameters, including the regularization term, which define our probabilistic model macroscopically. For this, we use a Boltzmann-machine type learning algorithm based on gradient ascent of the marginal likelihood with respect to the hyper-parameters. In our framework, one can estimate both the probabilistic model (hyper-parameters) and the motion fields simultaneously. We show that our motion estimation is much better than the result given by Zhang and Hanouer (1995), in which the hyper-parameters are set to ad-hoc values without any theoretical explanation.



This paper is organized as follows. In Section II, we explain our general set-up for motion velocity estimation by means of spatio-temporal MRFs, following Zhang and Hanouer (1995). From the viewpoint of Bayesian inference, we construct the posterior probability and introduce two kinds of estimates: the Maximum A Posteriori (MAP for short) and the Maximizer of Posterior Marginal (MPM for short) estimates. In Section III, we use the mean-field approximation to obtain the MPM estimate and derive the non-linear mean-field equations for the motion velocity fields. As a preliminary, in Section IV we test our mean-field approach with the hyper-parameters set to the values chosen by Zhang and Hanouer (1995). We show that the mean fields diverge, leading to a much worse estimate of the motion velocity. To avoid this difficulty, we rescale the regularization term by introducing a scaling factor and optimizing it by minimizing the mean-square error. In Sections V and VI, we estimate the optimal values of the hyper-parameters, including the regularization term, which define our probabilistic model macroscopically. We do this with a Boltzmann-machine type learning algorithm based on gradient ascent of the marginal likelihood with respect to the hyper-parameters. In our framework, one can estimate both the probabilistic model (hyper-parameters) and the motion velocity fields simultaneously. The learning equations contain sums whose number of terms grows exponentially with the system size. We evaluate them in two different ways: a hybridization of the mean-field approximation and MCMC, and simple MCMC. We find that the average-case performance of our motion estimation is much better than the result given by Zhang and Hanouer (1995), in which the hyper-parameters are set to ad-hoc values. The last section is a summary.

II. General set-up of motion estimation 
In this section, we briefly explain our model system. 

A. Spatio-temporal Markov random fields 

Let us define a single two-dimensional gray-scale image, a 'video frame', by $x^{\tau} = \{x_i^{\tau}, i \in S\}$. Here $S$ denotes the set of pixels in the image, and the index $i$ labels a point $(x, y)$ of the two-dimensional square lattice. We assume that a motion picture consists of successive static images (frames), and we distinguish each static image in the motion picture by the time index $\tau$ as $x^{\tau}$. When we compare two consecutive static images, $x^{\tau-1}$ and $x^{\tau}$, each pixel in $x^{\tau-1}$ might change its location with some 'motion velocity'. With this assumption in mind, we introduce the velocity fields defined by $d^{\tau} = \{d_i^{\tau}, i \in S\}$. Namely, for each $i$ and for two successive video frames, the constraint

$$x_i^{\tau} = x_{i - d_i^{\tau}}^{\tau-1}$$

should be satisfied. The 'index' $d_i^{\tau}$ corresponds to a single point $(u_d(i), v_d(i))$ in the two-dimensional vector field. In this paper, each component of the vector takes a discrete value, and its range is limited to $|u_d(i)|, |v_d(i)| \le d_{\max} - 1 = 5$. This range might seem extremely small compared with the range of gray scales in the images (from 0 to 255) or the image size ($\sim 30 \times 30$). However, limiting the velocity fields to such a small range is rather desirable for some applications. One example is a system that detects a dangerous state from an 'infinitesimal difference' in a patient's breathing in an ICU (Intensive Care Unit) and raises an alarm.
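The constraint above can be checked pixel by pixel through the squared displaced-frame difference $(x_i^{\tau}-x_{i-d_i}^{\tau-1})^2$, which later enters the cost function. The following Python sketch is illustrative only. Several conventions in it are assumptions, not part of the original method: the helper names, the list-of-rows image layout, taking $u$ as the horizontal and $v$ as the vertical shift, and returning `None` when the source pixel $i - d_i$ falls outside the image.

```python
# Hypothetical helpers; the velocity components are assumed to satisfy |u|, |v| <= 5.
D_BOUND = 5

def velocity_labels(bound=D_BOUND):
    """All discrete velocity vectors (u, v) with |u|, |v| <= bound."""
    return [(u, v) for u in range(-bound, bound + 1)
            for v in range(-bound, bound + 1)]

def displaced_residual(curr, prev, row, col, d):
    """Squared difference (x_i^tau - x_{i-d_i}^{tau-1})^2 for pixel i = (row, col).

    Frames are lists of rows; u is taken as the horizontal (column) shift and
    v as the vertical (row) shift. Returns None if i - d_i leaves the image.
    """
    u, v = d
    src_row, src_col = row - v, col - u
    if not (0 <= src_row < len(prev) and 0 <= src_col < len(prev[0])):
        return None
    diff = curr[row][col] - prev[src_row][src_col]
    return diff * diff

# Toy frames: a single bright pixel moves one step to the right.
prev = [[0, 0, 0],
        [0, 9, 0],
        [0, 0, 0]]
curr = [[0, 0, 0],
        [0, 0, 9],
        [0, 0, 0]]

print(len(velocity_labels()))                        # 121
print(displaced_residual(curr, prev, 1, 2, (1, 0)))  # 0: constraint satisfied
print(displaced_residual(curr, prev, 1, 2, (0, 0)))  # 81
```

With this bound, each pixel can take 11 × 11 = 121 velocity labels, which already hints at how fast exhaustive sums over all pixels grow.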

1) Line fields and segmentation fields: It is obviously impossible to determine $d = \{d_i, i \in S\}$ uniquely from the information in only two video frames $x^{\tau-1}$ and $x^{\tau}$. To compensate for this lack of information, we introduce line fields and segmentation fields.

The line fields encode the continuity between the motion velocity fields of two nearest-neighboring pixels. Where the fields are continuous, we assume that the two motion velocity vectors take similar values. Let us define these line fields by $l = \{l(i,j)\} = \{h_i, v_i \mid i \in S\}$. Here $h_i$ and $v_i$ are labels representing the continuity between the velocity fields of the nearest-neighboring (n.n. for short) pixels in the horizontal and vertical directions, respectively. In other words, we define

$$h_i = \begin{cases} 0 & (d\text{'s for horizontally n.n. pixels are discontinuous}) \\ 1 & (d\text{'s for horizontally n.n. pixels are continuous}) \end{cases}$$

$$v_i = \begin{cases} 0 & (d\text{'s for vertically n.n. pixels are discontinuous}) \\ 1 & (d\text{'s for vertically n.n. pixels are continuous}) \end{cases}$$

The segmentation fields, on the other hand, are introduced to distinguish 'predictable areas' from 'unpredictable areas' in the motion velocity fields. Here 'unpredictable areas' means regions that were hidden by some objects before those objects moved elsewhere. We therefore define the segmentation fields by $s = \{s_i \mid s_i = 0, 1\}$ with

$$s_i = \begin{cases} 0 & (\text{pixel } i \text{ is predictable}) \\ 1 & (\text{pixel } i \text{ is unpredictable}) \end{cases}$$

B. Bayes rule and posterior probability 

In the previous subsection, we described the motion picture as a series of successive static images modeled by spatio-temporal Markov random fields. To determine the motion velocity fields uniquely, we also introduced the line and segmentation fields. Our problem is now to infer the velocity vector field $d^{\tau}$, the line field $l^{\tau}$ and the segmentation field $s^{\tau}$ from two observed consecutive video frames $x^{\tau-1}$ and $x^{\tau}$. We use Bayes' rule to obtain the posterior probability of $\Sigma^{\tau} = \{d^{\tau}, l^{\tau}, s^{\tau}\}$ given $x^{\tau}$ and $x^{\tau-1}$:

$$P(\Sigma^{\tau}|x^{\tau},x^{\tau-1}) = \frac{P(x^{\tau}|\Sigma^{\tau},x^{\tau-1})P(\Sigma^{\tau}|x^{\tau-1})}{\sum_{\Sigma^{\tau}} P(x^{\tau}|\Sigma^{\tau},x^{\tau-1})P(\Sigma^{\tau}|x^{\tau-1})} = \frac{P(x^{\tau}|\Sigma^{\tau},x^{\tau-1})P(\Sigma^{\tau}|x^{\tau-1})}{P(x^{\tau}|x^{\tau-1})} \qquad (1)$$

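where the sums appearing in the above formula are defined by

$$\sum_{\Sigma^{\tau}}(\cdots) \equiv \sum_{d}\sum_{s}\sum_{l}(\cdots) \qquad (2)$$

with

$$\sum_{d}(\cdots) \equiv \prod_{i=1}^{N}\sum_{d_i}(\cdots), \qquad \sum_{s}(\cdots) \equiv \prod_{i=1}^{N}\sum_{s_i=0,1}(\cdots) \qquad (3)$$

$$\sum_{l}(\cdots) \equiv \prod_{i=1}^{N}\sum_{h_i=0,1}\sum_{v_i=0,1}(\cdots) \qquad (4)$$

Here the sum over $d_i$ runs over all admissible discrete velocity labels.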

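For the above posterior, the so-called Maximum A Posteriori (MAP) estimate is given by

$$\hat{\Sigma}_{\rm MAP} = \arg\max_{\Sigma^{\tau}} \log P(\Sigma^{\tau}|x^{\tau},x^{\tau-1}) \qquad (5)$$

The Maximizer of Posterior Marginal (MPM) estimate, by contrast, is given by

$$\hat{\Sigma}_{i,\rm MPM} = \arg\max_{\Sigma_i^{\tau}} P(\Sigma_i^{\tau}|x^{\tau},x^{\tau-1}) = \mathcal{Q}(\langle \Sigma_i^{\tau}\rangle) \qquad (6)$$

where the marginal probability is defined by

$$P(\Sigma_i^{\tau}|x^{\tau},x^{\tau-1}) = \sum_{\Sigma^{\tau} \setminus \Sigma_i^{\tau}} P(\Sigma^{\tau}|x^{\tau},x^{\tau-1}). \qquad (7)$$

The average $\langle\cdots\rangle$ appearing in (6) is defined as $\langle\cdots\rangle = \sum_{\Sigma^{\tau}}(\cdots)P(\Sigma^{\tau}|x^{\tau},x^{\tau-1})$. The function $\mathcal{Q}(\cdots)$ converts the real-valued expectation $\langle\Sigma_i^{\tau}\rangle = \sum_{\Sigma^{\tau}}\Sigma_i^{\tau}P(\Sigma^{\tau}|x^{\tau},x^{\tau-1})$ into the nearest discrete value.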
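To make the difference between (5) and (6) concrete, the following Python sketch enumerates a made-up posterior over two binary variables. The probabilities are hypothetical and chosen only so that the two estimates disagree. The sketch computes the MAP configuration, the site-wise maximizers of the marginals (7), and the rounded expectations $\mathcal{Q}(\langle\Sigma_i\rangle)$. For binary variables, rounding the expectation gives the same value as maximizing the marginal whenever there is no tie.

```python
from fractions import Fraction

# Hypothetical posterior over two binary fields (sigma_0, sigma_1).
posterior = {
    (0, 0): Fraction(35, 100),
    (0, 1): Fraction(0),
    (1, 0): Fraction(325, 1000),
    (1, 1): Fraction(325, 1000),
}
assert sum(posterior.values()) == 1

def map_estimate(post):
    """Configuration with the largest joint posterior probability, as in (5)."""
    return max(post, key=post.get)

def marginal(post, site, value):
    """Marginal probability P(sigma_site = value), as in (7)."""
    return sum(p for cfg, p in post.items() if cfg[site] == value)

def mpm_estimate(post, n_sites, values=(0, 1)):
    """Site-wise maximizer of the posterior marginal, as in (6)."""
    return tuple(max(values, key=lambda v: marginal(post, i, v))
                 for i in range(n_sites))

def rounded_expectation(post, site, values=(0, 1)):
    """Q(<sigma_site>): the expectation rounded to the nearest allowed value."""
    mean = sum(cfg[site] * p for cfg, p in post.items())
    return min(values, key=lambda v: abs(v - mean))

print(map_estimate(posterior))                                     # (0, 0)
print(mpm_estimate(posterior, 2))                                  # (1, 0)
print(tuple(rounded_expectation(posterior, i) for i in range(2)))  # (1, 0)
```

The MAP estimate picks the single most probable joint configuration. The MPM estimate decides each site from its own marginal, so the two can differ, as they do here.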

1) Likelihood function: The likelihood function $P(x^{\tau}|\Sigma,x^{\tau-1})$ appearing in the posterior can be regarded as a probabilistic model that generates the next frame $x^{\tau}$, given the unknown fields $\Sigma$ and the frame at the previous time $x^{\tau-1}$. From now on, we omit the $\tau$-dependence of the fields, because we consider the motion velocity fields for a given set of just two consecutive video frames. We assume $P(x^{\tau}|\Sigma,x^{\tau-1}) \propto \exp[-E^{(1)}(x^{\tau}|\Sigma,x^{\tau-1})]$, where the cost function $E^{(1)}(x^{\tau}|\Sigma,x^{\tau-1})$ is given by

$$E^{(1)}(x^{\tau}|\Sigma,x^{\tau-1}) = \frac{1}{2\sigma^2}\sum_{i}(1-s_i)(x_i^{\tau}-x_{i-d_i}^{\tau-1})^2 + \cdots \qquad (8)$$

Here and in what follows, $N(i)$ denotes the set of nearest-neighboring pixels around pixel $i$; on the square lattice, $|N(i)| = 4$. The parameters appearing in $E^{(1)}$, such as $\sigma$, are the so-called hyper-parameters, which determine the probabilistic model macroscopically.

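2) Prior probability: The prior probability $P(\Sigma|x^{\tau-1})$ is a generating model of the fields for a given frame $x^{\tau-1}$. It is given by $P(\Sigma|x^{\tau-1}) \propto \exp[-E^{(2)}(\Sigma|x^{\tau-1})]$ with

$$E^{(2)}(\Sigma|x^{\tau-1}) = \lambda_d\sum_{i,j\in N(i)}(1-2e^{-\beta_d\|d_i-d_j\|^2})(1-l(i,j)) + \lambda_s\sum_{i,j\in N(i)}(1-l(i,j))(1-2\delta(s_i,s_j)) + \cdots + T_s\sum_i s_i \qquad (9)$$

The norm $\|\cdots\|$ is defined by $\|d_i-d_j\|^2 = (u_d(i)-u_d(j))^2 + (v_d(i)-v_d(j))^2$. The parameters $\lambda_d$, $\beta_d$, $\lambda_s$, $\alpha_l$ and $T_s$ are also hyper-parameters, which define the above probabilistic model macroscopically.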

3) Posterior: The posterior $P(\Sigma|x^{\tau},x^{\tau-1})$ is the probability of the desired fields for a given pair of successive video frames $x^{\tau}, x^{\tau-1}$. It is constructed as the product of the likelihood $P(x^{\tau}|\Sigma,x^{\tau-1})$ and the prior $P(\Sigma|x^{\tau-1})$, that is, $P(\Sigma|x^{\tau},x^{\tau-1}) \propto P(x^{\tau}|\Sigma,x^{\tau-1})P(\Sigma|x^{\tau-1})$. In terms of the cost functions, we have

$$P(\Sigma|x^{\tau},x^{\tau-1}) \propto \exp[-E^{(1)}(x^{\tau}|\Sigma,x^{\tau-1}) - E^{(2)}(\Sigma|x^{\tau-1})] = \exp[-E(\Sigma|x^{\tau},x^{\tau-1})]. \qquad (10)$$

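The total cost of the system is defined by $E(\Sigma|x^{\tau},x^{\tau-1}) = -\log P(\Sigma|x^{\tau},x^{\tau-1})$ up to an additive constant. It is written as

$$E(\Sigma|x^{\tau},x^{\tau-1}) = \frac{1}{2\sigma^2}\sum_i (1-s_i)(x_i^{\tau}-x_{i-d_i}^{\tau-1})^2 + \lambda_d\sum_{i,j\in N(i)}(1-2e^{-\beta_d\|d_i-d_j\|^2})(1-l(i,j)) + \lambda_s\sum_{i,j\in N(i)}(1-l(i,j))(1-2\delta(s_i,s_j)) + \cdots + T_s\sum_i s_i \qquad (11)$$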



The first term on the right-hand side of the above cost function penalizes a velocity $d_i$ for which the gray level $x_i^{\tau}$ differs from the gray level $x_{i-d_i}^{\tau-1}$ at the source position $i - d_i$, for predictable pixels ($s_i = 0$). The second term enforces continuity between the velocity vectors of nearest-neighboring pixels. This term is identical to the Hamiltonian (energy function) of the so-called dynamically diluted ferromagnetic Q-Ising model in the statistical physics literature. That is, in the limit $\beta_d \to 0$,

$$\lambda_d\sum_{i,j\in N(i)}(1-2e^{-\beta_d\|d_i-d_j\|^2})(1-l(i,j)) \simeq 2\lambda_d\beta_d\sum_{i,j\in N(i)}(1-l(i,j))\|d_i-d_j\|^2 + (d\text{-independent const.}) \qquad (12)$$

which follows from $e^{-\beta_d\|d_i-d_j\|^2} \simeq 1 - \beta_d\|d_i-d_j\|^2$. The third term in (11) describes a correlation between the line and the segmentation fields. The fourth term represents a correlation between the line fields and the distance between pixels located at nearest-neighboring positions. The last term controls the number of non-zero segmentation fields and can be regarded as the so-called chemical potential in the statistical physics literature.
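The small-$\beta_d$ statement in (12) can be checked numerically. The sketch below compares the exact pair term with its linearization $2\lambda_d\beta_d(1-l)\|d_i-d_j\|^2 - \lambda_d(1-l)$. The values $\lambda_d = 2.5$ and $\beta_d = 10^{-3}$ and the range of squared distances are illustrative choices, not values used in the paper. Writing $r = \|d_i-d_j\|^2$, the difference is $2\lambda_d(1-\beta_d r - e^{-\beta_d r})(1-l)$, which is of order $\lambda_d\beta_d^2 r^2$.

```python
import math

def pair_term(r, lam_d, beta_d, l_ij):
    """Exact pair term lam_d (1 - 2 exp(-beta_d r)) (1 - l), with r = ||d_i - d_j||^2."""
    return lam_d * (1.0 - 2.0 * math.exp(-beta_d * r)) * (1 - l_ij)

def linearized_term(r, lam_d, beta_d, l_ij):
    """Small-beta_d form: 2 lam_d beta_d (1 - l) r plus the d-independent constant -lam_d (1 - l)."""
    return 2.0 * lam_d * beta_d * (1 - l_ij) * r - lam_d * (1 - l_ij)

lam_d, beta_d = 2.5, 1e-3
errors = [abs(pair_term(r, lam_d, beta_d, 0) - linearized_term(r, lam_d, beta_d, 0))
          for r in range(51)]
print(max(errors))  # about 6.1e-3, i.e. of order lam_d * beta_d**2 * r**2
```

When $l(i,j) = 1$, both forms vanish, so the line field switches the coupling off exactly, as in the diluted Q-Ising picture.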

III. Mean-field equations on pixel 

In the previous section, we constructed the posterior by means of Bayes' rule. We can therefore use both the MAP and the MPM estimates, given by (5) and (6) respectively. Note that the MAP estimate is recovered by means of

$$\hat{\Sigma}_{i,\rm MAP} = \lim_{\beta\to\infty}\mathcal{Q}(\langle\Sigma_i\rangle_{\beta}), \qquad \langle\cdots\rangle_{\beta} = \sum_{\Sigma}(\cdots)P_{\beta}(\Sigma|x^{\tau},x^{\tau-1})$$

with $P_{\beta}(\Sigma|x^{\tau},x^{\tau-1}) \propto \exp[-\beta E(\Sigma|x^{\tau},x^{\tau-1})]$. From the above definitions, the MPM estimate is obtained by $\hat{\Sigma}_{i,\rm MPM} = \mathcal{Q}(\langle\Sigma_i\rangle_{\beta=1})$. Our problem might therefore seem completely solved. However, consider the sum appearing in the expectation $\langle\cdots\rangle_{\beta}$:

$$\sum_{\Sigma}(\cdots) = \sum_{d_1}\cdots\sum_{d_N}\sum_{s_1}\cdots\sum_{s_N}\sum_{l_1}\cdots\sum_{l_N}(\cdots) \qquad (13)$$

Its number of terms grows exponentially with the number of pixels $N$. It is obviously impossible to carry out these sums within a realistic time, even for a system of size $N = 30 \times 30 = 900$.

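We therefore use the mean-field approximation to overcome this computational difficulty. Namely, we rewrite the cost function by replacing all fields except a single component with their corresponding expectations. For instance, for $s_i$ we obtain the following mean-field approximated cost function, up to terms that do not depend on $s_i$:

$$E^{\rm mf}(s_i) = -\frac{s_i}{2\sigma^2}(x_i^{\tau}-x_{i-\langle d_i\rangle^{\rm mf}}^{\tau-1})^2 + \lambda_s\sum_{j\in N(i)}(1-\langle l(i,j)\rangle^{\rm mf})(1-2\delta(s_i,\langle s_j\rangle^{\rm mf})) + T_s s_i$$

In the same way, we obtain for $d_i$

$$E^{\rm mf}(d_i) = \frac{1-\langle s_i\rangle^{\rm mf}}{2\sigma^2}(x_i^{\tau}-x_{i-d_i}^{\tau-1})^2 + \lambda_d\sum_{j\in N(i)}(1-2e^{-\beta_d\|d_i-\langle d_j\rangle^{\rm mf}\|^2})(1-\langle l(i,j)\rangle^{\rm mf})$$

and for $l(i,j)$

$$E^{\rm mf}(l(i,j)) = \lambda_d(1-2e^{-\beta_d\|\langle d_i\rangle^{\rm mf}-\langle d_j\rangle^{\rm mf}\|^2})(1-l(i,j)) + \lambda_s(1-l(i,j))(1-2\delta(\langle s_i\rangle^{\rm mf},\langle s_j\rangle^{\rm mf})) + \cdots$$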
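where $\delta(\cdots)$ stands for a delta function. From these approximated cost functions, one obtains self-consistent equations for the single-site averages. They have the form

$$\langle s_i\rangle^{\rm mf} = \frac{\sum_{s_i=0,1}s_i\,e^{-\beta E^{\rm mf}(s_i)}}{\sum_{s_i=0,1}e^{-\beta E^{\rm mf}(s_i)}}$$

and similarly for $\langle d_i\rangle^{\rm mf}$ and $\langle l(i,j)\rangle^{\rm mf}$. We regard these self-consistent equations as the following non-linear maps:

$$\langle d_i\rangle^{\rm mf}_{t+1} = f_{d_i}(\langle d\rangle^{\rm mf}_t, \langle s\rangle^{\rm mf}_t, \langle l\rangle^{\rm mf}_t) \qquad (14)$$
$$\langle s_i\rangle^{\rm mf}_{t+1} = f_{s_i}(\langle d\rangle^{\rm mf}_t, \langle s\rangle^{\rm mf}_t, \langle l\rangle^{\rm mf}_t) \qquad (15)$$
$$\langle l(i,j)\rangle^{\rm mf}_{t+1} = f_{l(i,j)}(\langle d\rangle^{\rm mf}_t, \langle s\rangle^{\rm mf}_t, \langle l\rangle^{\rm mf}_t) \qquad (16)$$

Here $f$ denotes the right-hand side of the corresponding self-consistent equation. We look for the steady states of these maps, which should satisfy the convergence condition

$$\left|\langle\Sigma_i\rangle^{\rm mf}_{t+1} - \langle\Sigma_i\rangle^{\rm mf}_t\right| < \epsilon \quad \text{for all } i \qquad (17)$$

where $\epsilon$ is a small value. In general, the control parameter $\beta$ can be made time-dependent, $\beta(t)$. The MAP estimate is then obtained by driving $\beta(t) \to \infty$ as $t \to \infty$. The MPM estimate, on the other hand, is constructed by keeping $\beta = 1$ throughout the iterations.

Generally speaking, the steady state $\langle\cdots\rangle^{\rm mf}$, which is a solution of the self-consistent equations, differs from the exact average $\langle\cdots\rangle_{\beta}$. However, $\langle\cdots\rangle^{\rm mf}$ is likely to be close to $\langle\cdots\rangle_{\beta}$ if the landscape of the cost function is not as complicated as that of spin glasses [11].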
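The following Python sketch illustrates iterating the non-linear maps under the convergence condition (17). It performs sequential mean-field updates of $\langle s_i\rangle^{\rm mf}$ alone on a hypothetical one-dimensional chain. It is a toy under stated assumptions, not the full algorithm:

- $d$ and $l$ are frozen, with $l(i,j) = 0$.
- The squared residuals $(x_i^{\tau}-x_{i-\langle d_i\rangle}^{\tau-1})^2$ are given directly.
- Each neighboring pair is counted once in the local field.
- $\delta(s_i,s_j)$ is averaged over $s_j$ rather than evaluated at $\langle s_j\rangle^{\rm mf}$.

The values $\lambda_s = 2$, $T_s = 5$ and $\mu\sigma^2 = 21 \times 0.2$ are borrowed from Section IV purely for illustration.

```python
import math

def logistic(z):
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def mean_field_segmentation(residuals, sigma2, lam_s, t_s, beta=1.0,
                            eps=1e-6, max_iter=1000):
    """Sequential mean-field updates of <s_i> on a 1-D chain (toy model).

    residuals[i] plays the role of (x_i^tau - x_{i-<d_i>}^{tau-1})^2; the
    line fields are fixed to l(i, j) = 0. Returns (means, converged).
    """
    n = len(residuals)
    m = [0.5] * n
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            neighbors = [j for j in (i - 1, i + 1) if 0 <= j < n]
            # Local cost difference E(s_i = 1) - E(s_i = 0): data term,
            # chemical potential T_s, and the lambda_s coupling with
            # delta(s_i, s_j) averaged over s_j ~ Bernoulli(m[j]).
            delta_e = -residuals[i] / (2.0 * sigma2) + t_s
            delta_e += lam_s * sum(2.0 - 4.0 * m[j] for j in neighbors)
            new_m = logistic(-beta * delta_e)  # P(s_i = 1) under E^mf
            max_change = max(max_change, abs(new_m - m[i]))
            m[i] = new_m
        if max_change < eps:  # stopping rule in the spirit of (17)
            return m, True
    return m, False

means, converged = mean_field_segmentation(
    residuals=[0, 0, 400, 400, 0], sigma2=21 * 0.2, lam_s=2.0, t_s=5.0)
print(converged, [round(v, 3) for v in means])
```

Pixels with a large residual are pushed toward the unpredictable label ($\langle s_i\rangle \to 1$), while well-matched pixels stay near 0. The stable logistic function avoids overflow even when the residual term is huge.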

IV. Preliminary : divergence of mean-fields 

To check the usefulness of the above procedure, we apply our mean-field algorithm to infer the motion velocity fields for the pair of successive frames shown in Fig. 1. These two frames are generated artificially, so the true motion velocity fields are explicitly available for checking our mean-field algorithm.

In Bayesian inference, setting the hyper-parameters appearing in the probabilistic model is generally one of the most important tasks. Here we examine the values $(\beta, \sigma^2, \lambda_d, \beta_d, \alpha_l, T_s, \lambda_s) = (1, 0.2, 2.5, 4, 200, 5, 2)$, which were given ad hoc by Zhang and Hanouer (1995). We find that this choice causes a divergence of the mean fields, such as $\langle s_i\rangle^{\rm mf}$. The divergence is due to the regularization terms $(1/2\sigma^2)(1-\langle s_i\rangle^{\rm mf})(x_i^{\tau}-x_{i-d_i}^{\tau-1})^2$ or $-(s_i/2\sigma^2)(x_i^{\tau}-x_{i-\langle d_i\rangle^{\rm mf}}^{\tau-1})^2$, which appear in the mean-field equations. Fig. 2 shows the resultant velocity fields calculated with this choice of hyper-parameters. The velocity fields shrink to a few points with small lengths, so the estimation of the true velocity fields evidently fails.

Fig. 1. Typical artificial images as a pair of successive video frames: the image before moving (upper left) and the image after moving (upper right). The lower panel shows the 'true' motion velocity fields for the situation given by the upper panels. In these images, arbitrary gray scales are given to the segmentation areas and to the region in which the objects are located.

Fig. 2. The resultant velocity fields calculated with the hyper-parameters $(\beta, \sigma^2, \lambda_d, \beta_d, \alpha_l, T_s, \lambda_s) = (1, 0.2, 2.5, 4, 200, 5, 2)$. The velocity fields shrink to a few points with small lengths.

A. Optimization of scaling factor 

The difficulty arises because these regularization terms diverge when evaluated for two extremely different pixel values. For instance, $x_i^{\tau} = 255$ and $x_{i-d_i}^{\tau-1} = 0$ lead to $e^{(255-0)^2/2\sigma^2} \to \infty$ in the Boltzmann weight. Hence there exist several serious cases (pairs of consecutive video frames) for which the ad-hoc choice of hyper-parameters causes this type of divergence during the iteration of the mean-field equations.

To avoid this essential difficulty, we rescale the hyper-parameter $\sigma^2$ as $\sigma^2 \mapsto \mu\sigma^2$. We then optimize the scaling factor $\mu$ with respect to several different performance measures.

1) Performance measures: We first introduce two different kinds of mean-square errors as average-case performance measures to determine the optimal scaling factor $\mu$:

$$D_1(\mu) = \frac{1}{N_1}\sum_{i=1}^{N}(1-s_i)\|d_i^{(0)}-d_i\|^2 \qquad (18)$$

$$D_2(\mu) = \frac{1}{N_2}\sum_{i=1}^{N}s_i\|d_i^{(0)}-d_i\|^2 \qquad (19)$$

Here $N_1 = \sum_i(1-s_i)$ and $N_2 = \sum_i s_i$, so that $N = N_1 + N_2$. The field $d^{(0)}$ is the true velocity field for the pair of successive images shown in Fig. 1. Thus $D_1$ is the mean-square error between the true and estimated velocity fields in the zero-segmentation regions, whereas $D_2$ is the mean-square error evaluated in the non-zero segmentation regions.

We also introduce bit-error rates, defined as the fraction of estimated pixels that differ from the true ones. Namely, we use

$$\delta_1(\mu) = \frac{1}{N_1}\sum_{i=1}^{N}(1-s_i)(1-\delta_{d_i^{(0)},d_i}) \qquad (20)$$

$$\delta_2(\mu) = \frac{1}{N_2}\sum_{i=1}^{N}s_i(1-\delta_{d_i^{(0)},d_i}) \qquad (21)$$

where $\delta_{d_i^{(0)},d_i}$ is a Kronecker delta for vectors defined by

$$\delta_{d_i^{(0)},d_i} = \delta_{u_{d^{(0)}}(i),u_d(i)}\,\delta_{v_{d^{(0)}}(i),v_d(i)} \qquad (22)$$

and $\delta_{x,y}$ is the conventional Kronecker delta. Fig. 3 plots the two kinds of mean-square errors $D_1, D_2$ (upper left) and the bit-error rates $\delta_1, \delta_2$ (upper right) as functions of the scaling factor $\mu$. The lower panel shows the resultant velocity fields obtained with the optimal scaling factor $\mu^* \simeq 21$. From these panels, we find that the resultant velocity fields are very close to the true fields when the scaling factor is set appropriately. However, the ad-hoc choice of the other hyper-parameters $(\beta, \sigma^2, \lambda_d, \beta_d, \alpha_l, T_s, \lambda_s)$ is not guaranteed to give the best possible velocity field estimate for another pair of successive images. To make matters worse, in practice we can use neither the mean-square error nor the bit-error rate, because these quantities require information about the true fields $d^{(0)}$ (see, for instance, the definition of $D_1$). We should therefore seek a theoretically justified way to determine the optimal hyper-parameters.

Fig. 3. Behaviour of the two kinds of mean-square errors $D_1$, $D_2$ (upper left) and the bit-error rates $\delta_1$, $\delta_2$ (upper right) as functions of the scaling factor $\mu$. The lower panel shows the resultant velocity fields obtained with the optimal scaling factor $\mu^* \simeq 21$. The gray-scale levels of the background and the segmentation areas are Q = … and Q = 40, respectively. The gray-scale levels of the moving object are distributed within the range Q = 10 to 30.
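Once the true field is known, the measures (18)-(22) are straightforward to compute. The following Python sketch assumes velocity fields stored as lists of integer pairs $(u, v)$ and a segmentation given as a list of 0/1 values. It normalizes the bit-error rates by $N_1$ and $N_2$ to mirror (18) and (19); that normalization is an assumption of this sketch. Empty regions yield NaN.

```python
import math

def performance_measures(d_true, d_est, s):
    """Return (D1, D2, delta1, delta2) for fields given as lists of (u, v) pairs."""
    n1 = sum(1 - si for si in s)   # number of predictable pixels (s_i = 0)
    n2 = sum(s)                    # number of unpredictable pixels (s_i = 1)

    def sq_norm(a, b):
        # ||d_i^(0) - d_i||^2 = (u difference)^2 + (v difference)^2
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    def ratio(total, count):
        return total / count if count else math.nan

    d1 = ratio(sum((1 - si) * sq_norm(t, e) for t, e, si in zip(d_true, d_est, s)), n1)
    d2 = ratio(sum(si * sq_norm(t, e) for t, e, si in zip(d_true, d_est, s)), n2)
    # Vector Kronecker delta (22): both components must agree.
    wrong = [0 if t == e else 1 for t, e in zip(d_true, d_est)]
    b1 = ratio(sum((1 - si) * w for w, si in zip(wrong, s)), n1)
    b2 = ratio(sum(si * w for w, si in zip(wrong, s)), n2)
    return d1, d2, b1, b2

d_true = [(1, 0), (1, 0), (0, 0), (0, 0)]
d_est = [(1, 0), (0, 0), (0, 0), (2, 1)]
s = [0, 0, 1, 1]
print(performance_measures(d_true, d_est, s))  # (0.5, 2.5, 0.5, 0.5)
```

As noted above, these measures need the true field $d^{(0)}$, so they are only usable for synthetic test pairs such as the one in Fig. 1.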

V. Maximum marginal likelihood criteria

Consider a probabilistic model with hyper-parameters $H = \{\mu, \sigma, \lambda_d, \lambda_s, T_s, \alpha_l, \beta_d\}$ and latent variables $\Sigma = \{s, d, l\}$. In statistics, the so-called maximum marginal likelihood estimation is widely used to determine such hyper-parameters. The marginal likelihood (the type-II likelihood) is defined by

$$-F_H(x^{\tau},x^{\tau-1}) = \log\sum_{\Sigma}P_H(x^{\tau}|\Sigma,x^{\tau-1})P_H(\Sigma|x^{\tau-1}) \qquad (23)$$

That is, the marginal likelihood is obtained by summing over the latent variables inside the (log) likelihood function. Note that the marginal likelihood depends on the 'input' pair of successive frames $x^{\tau}, x^{\tau-1}$. We can easily show that the marginal likelihood is maximized at the true values of the hyper-parameters $H^0$, namely,

$$[-F_{H^0}(x^{\tau},x^{\tau-1})]_{x^{\tau},x^{\tau-1}} \ge [-F_H(x^{\tau},x^{\tau-1})]_{x^{\tau},x^{\tau-1}} \qquad (24)$$

where the observable data average is defined by $[\cdots]_{x^{\tau},x^{\tau-1}} = \sum_{x^{\tau},x^{\tau-1}}(\cdots)P_{H^0}(x^{\tau},x^{\tau-1})$.

A. Kullback-Leibler information

Because the Kullback-Leibler (KL) information cannot be negative, the inequality (24) follows easily. Consider the KL information between the true probabilistic model $P_{H^0}(x^{\tau},x^{\tau-1})$ and the model $P_H(x^{\tau},x^{\tau-1})$. From the definition of the KL information, we immediately have

$$KL(P_{H^0}\|P_H) = \sum_{x^{\tau},x^{\tau-1}}P_{H^0}(x^{\tau},x^{\tau-1})\log P_{H^0}(x^{\tau},x^{\tau-1}) - \sum_{x^{\tau},x^{\tau-1}}P_{H^0}(x^{\tau},x^{\tau-1})\log P_H(x^{\tau},x^{\tau-1}) \ge 0.$$

Since $-F_H = \log P_H(x^{\tau}|x^{\tau-1})$ and the distribution of $x^{\tau-1}$ does not depend on $H$, the right-hand side equals $[-F_{H^0}] - [-F_H]$. Equality holds if and only if $H = H^0$. The inequality (24) therefore holds, which means that the marginal likelihood takes its maximum at the true values of the hyper-parameters. We use this fact to determine the hyper-parameters. In other words, the negative marginal likelihood $F_H$ is regarded as a 'cost function' whose lowest-energy states are candidates for the true hyper-parameters.

VI. Hyper-parameter estimation

As we saw in the previous section, we should determine the hyper-parameters so as to maximize the marginal likelihood, that is, to minimize $F_H$. In this section, we construct Boltzmann-machine type learning equations, derived by taking the gradient of the marginal likelihood with respect to the hyper-parameters.

A. Boltzmann-machine learning and its dynamics

Let us define $C(\Sigma)$ as the conjugate statistic for the parameter $H$. The Boltzmann-machine learning equation is then obtained as

$$\frac{dH}{dt} = -\frac{\partial F_H(x^{\tau},x^{\tau-1})}{\partial H}, \qquad \frac{d[-F_H(x^{\tau},x^{\tau-1})]}{dt} = \left[\frac{\partial F_H(x^{\tau},x^{\tau-1})}{\partial H}\right]^2 \ge 0 \qquad (25)$$

so the marginal likelihood never decreases along the learning dynamics. Writing (25) component by component for $B$, $\lambda_d$, $\lambda_s$, $\alpha_l$, $\beta_d$ and $T_s$ gives the learning equations (26)-(32). Their right-hand sides are averages over the posterior, that is, ratios of sums of the form $\sum_{\Sigma}(\cdots)e^{-U}/\sum_{\Sigma}e^{-U}$. In these equations we defined

$$A_d(d_i,d_j,l(i,j)) = \sum_{i,j\in N(i)}(1-2e^{-\beta_d\|d_i-d_j\|^2})(1-l(i,j)) \qquad (33)$$

$$B_{\beta}(l(i,j),d_i,d_j) = \sum_{i,j\in N(i)}(1-l(i,j))\|d_i-d_j\|^2 e^{-\beta_d\|d_i-d_j\|^2} \qquad (34)$$

together with $B = 1/2\mu\sigma^2$ and $U = \beta E(\Sigma|x^{\tau},x^{\tau-1})$. The number of terms in the sums on the right-hand sides of these equations grows exponentially with $N$, so it is impossible to carry them out exactly.
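The structure of (25) can be seen on a model small enough to be enumerated exactly. There, the parameter is driven by the difference between a model average and a data statistic, and the marginal likelihood never decreases. The following Python sketch is a hypothetical toy, not the motion model: a single visible variable $x \in \{0,1,2,3\}$ with $P_\theta(x) \propto e^{-\theta x}$. Then $C(x) = x$ is the conjugate statistic, and $-\partial F/\partial\theta = \langle C\rangle_\theta - [C]_{\rm data}$. The Euler step size and the data are illustrative choices. The exact sums play the role that MCMC plays in the full problem.

```python
import math

STATES = [0, 1, 2, 3]  # hypothetical visible values x, with statistic C(x) = x

def model_mean(theta):
    """<C>_theta = sum_x x e^{-theta x} / sum_x e^{-theta x} (exact enumeration)."""
    weights = [math.exp(-theta * x) for x in STATES]
    return sum(x * w for x, w in zip(STATES, weights)) / sum(weights)

def avg_log_likelihood(theta, data):
    """-F(theta): average of log P_theta(x) over the data."""
    log_z = math.log(sum(math.exp(-theta * x) for x in STATES))
    return sum(-theta * x - log_z for x in data) / len(data)

def boltzmann_learning(data, theta=0.0, dt=0.5, steps=500):
    """Euler steps of d(theta)/dt = -dF/d(theta) = <C>_theta - [C]_data."""
    data_mean = sum(data) / len(data)
    history = [avg_log_likelihood(theta, data)]
    for _ in range(steps):
        theta += dt * (model_mean(theta) - data_mean)
        history.append(avg_log_likelihood(theta, data))
    return theta, history

data = [0, 1, 1, 2, 0, 2]   # hypothetical observations, mean 1.0
theta_hat, history = boltzmann_learning(data)
print(theta_hat, model_mean(theta_hat))  # the model mean matches the data mean
```

At the fixed point, the model average of the conjugate statistic equals its data average (moment matching), and `history` is non-decreasing, as (25) requires.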



B. Hybridization of mean-field approximation and MCMC 

To overcome this computational difficulty, we use the mean-field approximation. We first replace the variables $\Sigma$ by their mean-field expectations, except for the variables appearing in the braces $\{\cdots\}$ on the right-hand side of each learning equation. We set $\beta = 1$, that is, we calculate the MPM estimate in our framework. For instance, $dB/dt = -\partial F_H/\partial B$ then becomes a ratio of sums over the remaining variables only, with the Boltzmann weight $e^{-U}$ replaced by $e^{-\langle U\rangle^{\rm mf}}$ (35). Here $\langle U\rangle^{\rm mf}$ (36) is the cost $U$ with every field outside the braces replaced by its mean-field value $\langle\cdots\rangle^{\rm mf}$. In the same way, $d\lambda_d/dt = -\partial F_H/\partial\lambda_d$ leads to (37)-(39), which involve the quantity $A_d$ of (33). The equations for the other parameters are rewritten as (40)-(49). These involve the quantity $B_{\beta}$ of (34) for $\beta_d$, and

$$A_s(l(i,j),s_i,s_j) = \sum_{i,j\in N(i)}(1-l(i,j))(1-2\delta(s_i,s_j)) \qquad (41)$$

Here $\langle\cdots\rangle^{\rm mf}$ denotes the solution of the corresponding mean-field equations for the hyper-parameter set $H^{(t)}$ at time $t$ of the learning equations. Several sums remain in the above learning equations, and they are still hard to evaluate by hand. One might evaluate them through mean-field expectations as well. With such a treatment, however, the learning equations would seek the hyper-parameters that minimize the cost function instead of the negative marginal likelihood. From the viewpoint of statistical physics, the marginal likelihood corresponds to the negative free energy, and the mean-field treatment eliminates the entropy term. Rewriting the marginal likelihood in the mean-field approximation therefore yields the negative cost function instead of the marginal likelihood. Appropriate hyper-parameters could then not be obtained by the maximum marginal likelihood criterion. For this reason, we use the Markov chain Monte Carlo method (MCMC) to evaluate the sums appearing on the right-hand sides of the learning equations.

To implement the learning equations on a computer, we discretize the time derivative by the Euler method, for example

$$B(t+\Delta t) = B(t) + \Delta t\,\frac{dB}{dt}$$

and similarly for the other hyper-parameters. We set the initial values of the hyper-parameters to $H^{(0)}$ and solve the mean-field equations. We then insert the solutions into the right-hand sides of the above learning equations and evaluate the remaining sums by MCMC. After that, we update the hyper-parameters with the discretized learning equations and advance the time (step) as $t \mapsto t+1$. We repeat these procedures until each hyper-parameter converges to a finite value. Here we set $\Delta t = 0.001$. The initial values $H^{(0)}$ are the same as those used by Zhang and Hanouer (1995).
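The Euler-discretized loop, in which a sampled average replaces an intractable sum, can be sketched on a hypothetical toy model whose answer is known. The model has independent binary spins $s_i \in \{0,1\}$ with $P_h(s) \propto e^{-h\sum_i s_i}$, the conjugate statistic $\sum_i s_i/N$, and a target data mean of 0.25. For this target, the maximum-likelihood value is $h^* = \ln 3$. A persistent Metropolis chain supplies the model average at each step. The step size $\Delta t = 0.5$, the number of sweeps and the seed are illustrative. The paper itself uses $\Delta t = 0.001$ and the full posterior.

```python
import math
import random

def metropolis_sweep(spins, h, rng):
    """One Metropolis sweep for independent spins with energy h * sum(spins)."""
    for i in range(len(spins)):
        delta_e = h * (1 - 2 * spins[i])  # energy change if spin i is flipped
        if delta_e <= 0 or rng.random() < math.exp(-delta_e):
            spins[i] = 1 - spins[i]

def learn_bias(data_mean, h=0.0, dt=0.5, steps=400, n_spins=50, sweeps=20, seed=1):
    """Euler-discretized Boltzmann-machine learning of h with MCMC averages."""
    rng = random.Random(seed)
    spins = [rng.randint(0, 1) for _ in range(n_spins)]  # persistent chain
    trajectory = []
    for _ in range(steps):
        total = 0.0
        for _ in range(sweeps):
            metropolis_sweep(spins, h, rng)
            total += sum(spins) / n_spins
        model_mean = total / sweeps          # MCMC estimate of <s>_h
        h += dt * (model_mean - data_mean)   # h(t + dt) = h(t) + dt * dh/dt
        trajectory.append(h)
    return trajectory

trajectory = learn_bias(data_mean=0.25)
h_estimate = sum(trajectory[-100:]) / 100
print(h_estimate, math.log(3))  # the estimate should fluctuate around ln 3
```

Keeping the chain persistent across learning steps avoids re-equilibrating from scratch each time the hyper-parameter changes slightly. This design choice is borrowed from common Boltzmann-machine practice rather than taken from the text.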

Fig. 4 shows typical snapshots of the velocity fields obtained by the hybridization of the mean-field approximation and MCMC at times $t = 0$ (upper left), $t = 10$ (upper right), $t = 20$ (lower left) and $t = 30$ (lower right). The case $t = 0$ corresponds to the result by Zhang and Hanouer (1995). From these panels, we find that our approach remarkably improves on the performance of Zhang and Hanouer (1995).

1) Average-case performance measures: To evaluate the average-case performance more quantitatively, we introduce two kinds of performance measures. The first one is defined by

$$K = \frac{1}{N}\sum_i(1-\cos\theta_i) \qquad (50)$$

where $\theta_i$ is the angle between the true velocity vector field $d^{(0)} = \{d_1^{(0)},\ldots,d_N^{(0)}\}$ and the estimated field $d = \{d_1,\ldots,d_N\}$ at pixel $i$. It is given explicitly by $\cos\theta_i = d_i^{(0)}\cdot d_i/(\|d_i^{(0)}\|\,\|d_i\|)$. From this definition, $K$ measures the error due to a mismatch in the direction of the estimated vectors.

Besides $K$, we introduce a second measure $L$, which quantifies the error due to a mismatch in the length of the estimated vectors. Fig. 5 shows the results, plotting the averages of $K$ and $L$ over 20 independent runs for various choices of the two successive video frames. From these two panels, we find that both errors decrease monotonically on average during the proposed learning procedure.

Fig. 4. Typical snapshots of velocity fields obtained by the hybridization of the mean-field approximation and MCMC at times $t = 0$ (upper left), $t = 10$ (upper right), $t = 20$ (lower left) and $t = 30$ (lower right). The case $t = 0$ corresponds to the result by Zhang and Hanouer (1995).
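The direction error (50) can be computed as follows. Since $\cos\theta_i$ is undefined when either vector has zero length, this Python sketch skips such pixels and averages over the remaining ones. That convention is an assumption, as the treatment of zero vectors is not specified in the text.

```python
import math

def direction_error(d_true, d_est):
    """K-like measure: mean of (1 - cos theta_i) over pixels with non-zero vectors."""
    terms = []
    for (u0, v0), (u, v) in zip(d_true, d_est):
        n0, n = math.hypot(u0, v0), math.hypot(u, v)
        if n0 == 0.0 or n == 0.0:
            continue  # cos(theta_i) undefined; skipped in this sketch
        cos_theta = (u0 * u + v0 * v) / (n0 * n)
        terms.append(1.0 - cos_theta)
    return sum(terms) / len(terms) if terms else 0.0

d_true = [(1, 0), (0, 2), (1, 1)]
d_est = [(2, 0), (0, -2), (0, 0)]
print(direction_error(d_true, d_est))  # 1.0: one aligned pixel, one reversed pixel
```

Each term lies between 0 (same direction) and 2 (opposite direction), independent of the vector lengths. This is why a separate length measure such as $L$ is needed.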

2) Computational cost measure: We next evaluate the computational cost. Our procedure obviously takes much longer than that of Zhang and Hanouer (1995), because at each Euler step one must solve the mean-field equations and also carry out the MCMC. Fig. 6 plots the CPU time CT [sec] as a function of the system size $N$. The CPU time is measured on our PC (DELL Optiplex960DT, Core2Quad Q9400 2.66 GHz). For Zhang and Hanouer (1995), CT is the CPU time needed for 50 mean-field iterations. For our proposed procedure, CT is the CPU time needed to reach $t = 50$ in the learning equations, with 50 mean-field iterations and 100 Monte Carlo steps performed for each $t$. From Fig. 6, we find that the difference between the two procedures increases exponentially. This does not mean, however, that our proposed procedure is computationally inferior to the ad-hoc choice of Zhang and Hanouer (1995). They found their values by trial and error, which might take quite a long time, although they did not mention this point explicitly in their paper.
Fig. 5. Time dependence of the performance measures K (upper panel) and L (lower panel). We plot the averages of K and L over 20 independent runs for various choices of the two successive video frames.

Fig. 6. Computational time (real CPU time) CT [sec] until the algorithm converges, as a function of the system size N.



C. Simple MCMC approach 

In general, the mean-field approximation is not very accurate. Here, instead of the hybridization of the mean-field approximation and the MCMC, we use simple MCMC alone to calculate the posterior expectations of the quantities appearing in the learning equations. We then compare the results with those obtained by the hybridization of the mean-field approximation and the MCMC discussed in the previous subsection.

Fig. 7. Typical snapshots of velocity fields obtained by the simple MCMC method at t = 0 (upper left), t = 10 (upper right), t = 20 (lower left), and t = 30 (lower right). The case of t = 0 corresponds to the result by Zhang and Hanouer (1995).
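The paper's motion-field posterior is not reproduced here. As a stand-in, the sketch below uses a toy one-dimensional Ising chain to show how simple MCMC can supply the two expectations that a Boltzmann-machine type learning rule needs. Spins x_i take the values -1 or +1, the prior weight is exp(beta * S(x)) with S = sum_i x_i x_{i+1}, and the likelihood is exp(h * sum_i x_i y_i) with h assumed known. For this toy model, the gradient of the log marginal likelihood with respect to beta is <S>_posterior - <S>_prior, and both averages are estimated by single-spin-flip Metropolis sampling. All names and settings (`metropolis_mean_S`, `learn_beta`, the learning rate, sweep counts, and the data) are illustrative assumptions.

```python
import math
import random


def metropolis_mean_S(n, beta, field, sweeps, burn_in, rng):
    """Estimate <S>, S = sum_i x_i * x_{i+1}, for an open chain of n spins
    with log-weight beta * S(x) + sum_i field[i] * x_i.

    field=None means no external field, which gives the toy prior.
    S is averaged over `sweeps` sweeps after discarding `burn_in` sweeps.
    """
    x = [rng.choice((-1, 1)) for _ in range(n)]
    total, count = 0.0, 0
    for sweep in range(burn_in + sweeps):
        for i in range(n):
            neighbours = (x[i - 1] if i > 0 else 0) + (x[i + 1] if i < n - 1 else 0)
            h_i = field[i] if field is not None else 0.0
            # Change in log-weight if spin i is flipped.
            delta = -2 * x[i] * (beta * neighbours + h_i)
            # Metropolis rule: accept with probability min(1, exp(delta)).
            if delta >= 0 or rng.random() < math.exp(delta):
                x[i] = -x[i]
        if sweep >= burn_in:
            total += sum(x[i] * x[i + 1] for i in range(n - 1))
            count += 1
    return total / count


def learn_beta(y, h, beta0=0.0, eta=0.01, steps=20, sweeps=200, burn_in=50, seed=0):
    """Gradient ascent on log P(y | beta) for the toy chain, using
    d/dbeta log P(y | beta) = <S>_posterior - <S>_prior."""
    rng = random.Random(seed)
    n = len(y)
    field = [h * yi for yi in y]  # external field contributed by the likelihood
    beta = beta0
    for _ in range(steps):
        s_post = metropolis_mean_S(n, beta, field, sweeps, burn_in, rng)
        s_prior = metropolis_mean_S(n, beta, None, sweeps, burn_in, rng)
        beta += eta * (s_post - s_prior)
    return beta


# Hypothetical observation: two domains with two flipped ("noisy") pixels.
y = [1] * 20 + [-1] * 20
for i in (5, 27):
    y[i] = -y[i]

beta_hat = learn_beta(y, h=1.0)
print(f"estimated beta: {beta_hat:.2f}")
```

Because the observation is made of large uniform domains, <S>_posterior exceeds <S>_prior at beta = 0, so the estimate moves toward a positive coupling, as one would expect for a smooth field.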

We show the results in Fig. 7. From these panels, we find that the resultant velocity fields at t = 30 are much closer to the true fields than those obtained by the hybridization.

We also evaluate the performance measures K and L and compare them with those of the hybridization of the mean-field approximation and the MCMC in Fig. 8. From these two panels, we find that at the initial stage of learning, the hybridization decreases both kinds of errors very quickly, but the errors eventually saturate. On the other hand, the errors of the simple MCMC do not decrease as quickly at the initial stage; however, they converge to lower values than those of the hybridization.
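The general remark at the start of this subsection, that the mean-field approximation is not very accurate, can be illustrated on a toy model small enough to solve exactly. The sketch below is not the paper's motion-field model: it compares naive mean-field magnetizations with exact enumeration for a 10-spin open Ising chain with weight exp(J * sum_i x_i x_{i+1} + h * sum_i x_i). The values J = 1 and h = 0.1 are arbitrary choices, and the function names are hypothetical. In one dimension the naive mean-field equations predict spurious long-range order, so the mean-field values overshoot the exact ones noticeably.

```python
import itertools
import math


def exact_magnetizations(n, J, h):
    """Exact <x_i> for an open chain of n spins, by enumerating all 2**n states."""
    Z = 0.0
    m = [0.0] * n
    for x in itertools.product((-1, 1), repeat=n):
        w = math.exp(J * sum(x[i] * x[i + 1] for i in range(n - 1)) + h * sum(x))
        Z += w
        for i in range(n):
            m[i] += w * x[i]
    return [mi / Z for mi in m]


def mean_field_magnetizations(n, J, h, iters=500, damping=0.5):
    """Naive mean-field: iterate m_i = tanh(h + J * (m_{i-1} + m_{i+1})),
    mixing old and new values (damping) for a stable fixed-point iteration."""
    m = [0.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            neigh = (m[i - 1] if i > 0 else 0.0) + (m[i + 1] if i < n - 1 else 0.0)
            new.append(math.tanh(h + J * neigh))
        m = [damping * a + (1 - damping) * b for a, b in zip(m, new)]
    return m


n, J, h = 10, 1.0, 0.1
exact = exact_magnetizations(n, J, h)
mf = mean_field_magnetizations(n, J, h)
worst = max(abs(a - b) for a, b in zip(exact, mf))
print(f"largest |mean-field - exact| over sites: {worst:.3f}")
```

With J = 0 the spins are independent and both functions return tanh(h) at every site, so the discrepancy seen for J = 1 comes entirely from the neglected correlations.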

We also compare the computational time until convergence for the hybridization and the simple MCMC. The result is shown in Fig. 9. From this figure, we notice that the hybridization takes much longer than the simple MCMC. This is because the number of Monte Carlo steps per learning step t is the same for both procedures, so the mean-field iterations that the hybridization performs at every learning step add to its cost.

Finally, we compare the hyper-parameters obtained by our methods with those chosen by Zhang and Hanouer (1995). The result is shown in TABLE I. This table tells us that several parameters in Zhang and Hanouer (1995) are very close to, or exactly the same as, ours, whereas some of the parameters are quite far from our results. This means that the ad-hoc choice by Zhang and Hanouer (1995) is statistically (theoretically) incorrect, and anyone who needs to choose statistically 'proper' hyper-parameters 'systematically' should use the procedures provided in this paper.

TABLE I
Comparison of the resultant hyper-parameters.

                            ?      B      λ_d    β_d    α_i    T_s
Zhang and Hanouer (1995)    2      5      2.5    4      200    5
Hybridization               2.3    12.1   2.7    3.8    232    5
simple MCMC                 2.5    11.7   2.8    3.7    220    5

Fig. 8. The Euler step dependence of K (upper panel) and L (lower panel) for the hybridization (solid line) and simple MCMC (broken line).

Fig. 9. Computational time (real CPU time) CT [sec] for the hybridization (solid line) and simple MCMC (broken line) as a function of system size N.



VII. Summary

In this paper, we numerically examined a Bayesian mean-field approach with the assistance of the MCMC method to estimate motion velocity fields and probabilistic models simultaneously in consecutive digital images described by spatio-temporal Markov random fields. We found that our motion estimation is much better than the result obtained by Zhang and Hanouer (1995), in which the hyper-parameters are set to ad-hoc values without any theoretical justification.

Several extensions are now in progress: using the EM algorithm to determine the hyper-parameters by maximizing the marginal likelihood indirectly [12], [13]; evaluating the average-case performance analytically by making use of mathematically solvable MRFs such as Gaussian MRFs [14] or infinite-range MRFs [12]; and applying belief propagation [15] to compute the marginal probabilities in our framework. The results will be reported at the conference or elsewhere.